Study on Preprocessing and Classifying Mass Spectral Raw Data Concerning Human Normal and Disease Cases
Abstract
Mass spectrometry is becoming an important tool in the biological sciences. Tissue samples and easily obtained biological fluids (serum, plasma, urine) are analysed by a variety of mass spectrometry methods, producing spectra with very high dimensionality and a high level of noise. Here we present a feature extraction method for mass spectra that consists of two main steps. In the first step, an algorithm for low-level preprocessing of mass spectra is applied, comprising denoising with the Shift-Invariant Discrete Wavelet Transform (SIDWT), smoothing, baseline correction, peak detection and normalization of the resulting peak lists. We argue that this step reduces the dimensionality and redundancy of the initial mass spectra representation while retaining all the meaningful features (potential biomarkers) needed to identify disease-related proteomic patterns. In the second step, the peak lists are aligned and fed to a Support Vector Machine (SVM), which classifies the mass spectra. The procedure was applied to SELDI-QqTOF spectral data collected from normal and ovarian cancer serum samples, and classification performance was assessed for different values of the parameters involved in the feature extraction pipeline. With the low-level preprocessing method described here, spectra classification achieves 98.3% sensitivity, 98.3% specificity and an area under the ROC curve (AUC) of 0.981.
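The abstract names the low-level preprocessing steps but not their exact algorithms, so the following is only a minimal Python/NumPy sketch of such a chain under stated assumptions, not the authors' implementation. Assumptions: the shift-invariant transform is taken to be an undecimated (stationary) Haar transform with circular boundaries and soft thresholding at the universal threshold; smoothing is a centered moving average; the baseline is a rolling minimum; peaks are local maxima above `snr` times a median-absolute-deviation noise estimate, with nearby maxima merged; and each peak list is normalized to unit total intensity. All function names and parameter defaults (`swt_haar_denoise`, `preprocess_spectrum`, `snr=5.0`, `baseline_window=101`, and so on) are hypothetical. Peak alignment and SVM classification are not shown.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def estimate_noise(x):
    """Robust noise std from finest-scale Haar details (median absolute deviation)."""
    d1 = (x - np.roll(x, -1)) / np.sqrt(2)
    return np.median(np.abs(d1)) / 0.6745


def swt_haar_denoise(x, levels=4, threshold=None):
    """Shift-invariant denoising (assumption: undecimated Haar transform with
    circular boundaries, soft thresholding of all detail levels)."""
    a = np.asarray(x, dtype=float)
    if threshold is None:
        # Universal threshold sigma * sqrt(2 ln N)
        threshold = estimate_noise(a) * np.sqrt(2 * np.log(a.size))
    details = []
    for j in range(levels):
        shifted = np.roll(a, -(2 ** j))  # shifted[n] = a[(n + 2**j) mod N]
        details.append((a - shifted) / np.sqrt(2))
        a = (a + shifted) / np.sqrt(2)
    for j in reversed(range(levels)):
        d = details[j]
        d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)  # soft threshold
        # The undecimated transform yields two estimates of each sample; average them.
        est_here = (a + d) / np.sqrt(2)
        est_from_left = np.roll((a - d) / np.sqrt(2), 2 ** j)
        a = 0.5 * (est_here + est_from_left)
    return a


def moving_average(y, window):
    """Centered moving average (odd window), edge-padded so the length is kept."""
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    return np.convolve(padded, np.ones(2 * half + 1) / (2 * half + 1), mode="valid")


def rolling_min_baseline(y, window):
    """Baseline estimate: minimum over a centered sliding window (edge-padded)."""
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    return sliding_window_view(padded, 2 * half + 1).min(axis=1)


def pick_peaks(y, min_height, min_distance):
    """Local maxima above min_height; among maxima closer than min_distance, keep the tallest."""
    inner = y[1:-1]
    cand = np.nonzero((inner > y[:-2]) & (inner >= y[2:]) & (inner > min_height))[0] + 1
    kept = []
    for i in cand[np.argsort(y[cand])[::-1]]:  # tallest candidates first
        if all(abs(i - k) >= min_distance for k in kept):
            kept.append(i)
    return np.sort(np.array(kept, dtype=int))


def preprocess_spectrum(intensity, snr=5.0, smooth=5, baseline_window=101, min_distance=20):
    """Denoise -> smooth -> baseline-correct -> pick peaks -> normalize the peak list."""
    sigma = estimate_noise(np.asarray(intensity, dtype=float))
    y = swt_haar_denoise(intensity)
    y = moving_average(y, smooth)
    y = y - rolling_min_baseline(y, baseline_window)
    idx = pick_peaks(y, snr * sigma, min_distance)
    heights = y[idx]
    return idx, heights / heights.sum()  # peak intensities as fractions of the peak-list total
```

For example, `idx, rel = preprocess_spectrum(intensity)` returns the sample indices of the detected peaks and their normalized intensities. Averaging the two reconstructions at each level is what makes the denoising insensitive to circular shifts of the input; in a complete pipeline the resulting peak lists would still have to be aligned across spectra before being passed to a classifier such as an SVM.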